NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Quicksand: Harnessing Stranded Datacenter Resources with Granular Computing

Ruan, Zhenyuan; Li, Shihang; Fan, Kaiyan; Park, Seo Jin; Aguilera, Marcos K.; Belay, Adam; Schwarzkopf, Malte (April 2025, 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI'25), USENIX Association)

Datacenters today waste CPU and memory, as resources demanded by applications often fail to match the resources available on machines. This leads to stranded resources because one resource that runs out prevents placing additional applications that could consume the other resources. Unusable stranded resources result in reduced utilization of servers, and wasted money and energy. Quicksand is a new framework and runtime system that unstrands resources by providing developers with familiar, high-level abstractions (e.g., data structures, batch computing). Internally, Quicksand decomposes these abstractions into resource proclets, granular units that each primarily consume resources of one type. Inspired by recent granular programming models, Quicksand decouples consumption of resources as much as possible. It splits, merges, and migrates resource proclets in milliseconds, so it can use resources on any machine, even if available only briefly. Evaluation of our prototype with four applications shows that Quicksand uses stranded resources effectively; that Quicksand reacts to changing resource availability and demand within milliseconds, increasing utilization; and that porting applications to Quicksand requires moderate effort.
Free, publicly-accessible full text available April 28, 2026
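To make the split-and-merge idea in the Quicksand abstract concrete, here is a toy Python sketch. It assumes a memory proclet that owns a contiguous key range of a sharded map; the names `MemProclet`, `merge`, and `rebalance` and the item-count thresholds are hypothetical illustrations, not Quicksand's API, and the sketch ignores migration, networking, and concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class MemProclet:
    """Hypothetical memory proclet: owns the keys in [lo, hi) of a sharded map."""
    lo: int
    hi: int
    data: dict = field(default_factory=dict)

    def split(self):
        """Split into two proclets at the midpoint of the key range."""
        mid = (self.lo + self.hi) // 2
        left = MemProclet(self.lo, mid, {k: v for k, v in self.data.items() if k < mid})
        right = MemProclet(mid, self.hi, {k: v for k, v in self.data.items() if k >= mid})
        return left, right


def merge(a, b):
    """Merge two proclets whose key ranges are adjacent."""
    assert a.hi == b.lo, "only adjacent ranges can be merged"
    return MemProclet(a.lo, b.hi, {**a.data, **b.data})


def rebalance(proclets, max_items, min_items):
    """Toy policy: split oversized proclets, then merge adjacent undersized ones.

    Assumes `proclets` is sorted by key range and that min_items <= max_items.
    """
    out = []
    for p in proclets:
        if len(p.data) > max_items and p.hi - p.lo > 1:
            out.extend(p.split())
        else:
            out.append(p)
    merged = []
    for p in out:
        if merged and merged[-1].hi == p.lo and len(merged[-1].data) + len(p.data) < min_items:
            merged[-1] = merge(merged[-1], p)
        else:
            merged.append(p)
    return merged
```

Splitting keeps each unit small enough to fit wherever memory is free, while merging avoids paying per-proclet overhead for many tiny units; the thresholds here stand in for whatever policy a real runtime would use.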
Harvesting Idle Memory for Application-managed Soft State with Midas

Qiao, Yifan; Ruan, Zhenyuan; Ma, Haoran; Belay, Adam; Kim, Miryung; Xu, Harry (April 2024, 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI'24))

Many applications can benefit from data that increases performance but is not required for correctness (commonly referred to as soft state). Examples include cached data from backend web servers and memoized computations in data analytics systems. Today's systems generally statically limit the amount of memory they use for storing soft state in order to prevent unbounded growth that could exhaust the server's memory. Static provisioning, however, makes it difficult to respond to shifts in application demand for soft state and can leave significant amounts of memory idle. Existing OS kernels can only spend idle memory on caching disk blocks—which may not have the most utility—because they do not provide the right abstractions to safely allow applications to store their own soft state. To effectively manage and dynamically scale soft state, we propose soft memory, an elastic virtual memory abstraction with unmap-and-reconstruct semantics that makes it possible for applications to use idle memory to store whatever soft state they choose while guaranteeing both safety and efficiency. We present Midas, a soft memory management system that contains (1) a runtime that is linked to each application to manage soft memory objects and (2) OS kernel support that coordinates soft memory allocation between applications to maximize their performance. Our experiments with four real-world applications show that Midas can efficiently and safely harvest idle memory to store applications' soft state, delivering near-optimal application performance and responding to extreme memory pressure without running out of memory.
Full Text Available
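The unmap-and-reconstruct contract described in the Midas abstract can be illustrated with a small Python model. It assumes a hypothetical `SoftCache` whose entries the system may drop at any time (modeled here by `reclaim`) and an application-supplied `reconstruct` callback that rebuilds a missing value. This mirrors the semantics described above but is not Midas's interface, and it does not model the kernel-side coordination between applications.

```python
from collections import OrderedDict


class SoftCache:
    """Toy soft-state store (hypothetical): entries may vanish at any time
    and are rebuilt on demand by an application-supplied callback."""

    def __init__(self, reconstruct):
        self._reconstruct = reconstruct  # rebuilds a value from its key
        self._entries = OrderedDict()    # key -> value, least recently used first
        self.reconstructions = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        # Never cached, or reclaimed under pressure: rebuild it.
        value = self._reconstruct(key)
        self.reconstructions += 1
        self._entries[key] = value
        return value

    def reclaim(self, n):
        """Simulate memory pressure: drop up to n least-recently-used entries."""
        dropped = 0
        while self._entries and dropped < n:
            self._entries.popitem(last=False)
            dropped += 1
        return dropped
```

Because every dropped entry can be rebuilt, losing soft state under memory pressure costs performance but not correctness, which is the property that makes it safe to hand idle memory to applications.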
Nu: Achieving Microsecond-Scale Resource Fungibility with Logical Processes

Ruan, Zhenyuan; Park, Seo Jin; Aguilera, Marcos K.; Belay, Adam; Schwarzkopf, Malte (April 2023, 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI'23))

Datacenters waste significant compute and memory resources today because they lack resource fungibility: the ability to reassign resources quickly and without disruption. We propose logical processes, a new abstraction that splits the classic UNIX process into units of state called proclets. Proclets can be migrated quickly within datacenter racks, to provide fungibility and adapt to the memory and compute resource needs of the moment. We prototype logical processes in Nu, and use it to build three different applications: a social network application, a MapReduce system, and a scalable key-value store. We evaluate Nu with 32 servers. Our evaluation shows that Nu achieves high efficiency and fungibility: it migrates proclets in ≈100μs; under intense resource pressure, migration causes small disruptions to tail latency—the 99.9th percentile remains below or around 1ms—for a duration of 0.54–2.1s, or a modest disruption to throughput (<6%) for a duration of 24–37ms, depending on the application.
Full Text Available
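One way to picture the Nu abstract's migratable proclets: if callers address a proclet by an identifier rather than by machine (an assumption of this sketch), then migration only has to move the proclet's state and update where that identifier lives. The toy Python model below shows that indirection. `Cluster`, `spawn`, `call`, and `migrate` are hypothetical names. The model is single-threaded and ignores in-flight calls, networking, and timing, all of which a real system must handle.

```python
class Cluster:
    """Toy model (hypothetical): proclets are addressed by id, not by server."""

    def __init__(self, servers):
        self.heap = {s: {} for s in servers}  # server -> {proclet id: state}
        self.location = {}                    # proclet id -> server currently hosting it

    def spawn(self, pid, server, state):
        self.heap[server][pid] = state
        self.location[pid] = server

    def call(self, pid, fn, *args):
        # Route by id; the caller never names the hosting server.
        server = self.location[pid]
        return fn(self.heap[server][pid], *args)

    def migrate(self, pid, dst):
        # Move the proclet's state, then repoint its location entry.
        src = self.location[pid]
        self.heap[dst][pid] = self.heap[src].pop(pid)
        self.location[pid] = dst
```

A call made after `migrate` reaches the same state on the new server without any change on the caller's side, which is the kind of non-disruptive reassignment the abstract calls fungibility.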
Unleashing True Utility Computing with Quicksand

https://doi.org/10.1145/3593856.3595893

Ruan, Zhenyuan; Li, Shihang; Fan, Kaiyan; Aguilera, Marcos K.; Belay, Adam; Park, Seo Jin; Schwarzkopf, Malte (June 2023, ACM)

Today's clouds are inefficient: their utilization of resources like CPUs, GPUs, memory, and storage is low. This inefficiency occurs because applications consume resources at variable rates and ratios, while clouds offer resources at fixed rates and ratios. This mismatch of offering and consumption styles prevents fully realizing the utility computing vision. We advocate for fungible applications, that is, applications that can distribute, scale, and migrate their consumption of different resources independently while fitting their availability across different servers (e.g., memory at one server, CPU at another). Our goal is to make use of resources even if they are transiently available on a server for only a few milliseconds. We are developing a framework called Quicksand for building such applications and unleashing the utility computing vision. Initial results using Quicksand to implement a DNN training pipeline are promising: Quicksand saturates resources that are imbalanced across machines or rapidly shift in quantity.
Full Text Available
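A small worked example shows how fixed per-server ratios strand resources, as the "Unleashing True Utility Computing" abstract argues. Assume (hypothetically) two servers with free capacity of (10 cores, 2 GB) and (1 core, 30 GB), and an application whose instances each need 2 cores and 8 GB. The sketch compares placement when each instance must fit on a single server against placement when CPU and memory may come from different servers. The second figure is an optimistic upper bound that ignores network cost and per-proclet granularity.

```python
# Free capacity per server as (cores, GB); values are hypothetical.
servers = [(10, 2), (1, 30)]


def monolithic_instances(servers, cpu, mem):
    """Instances that fit when each one takes all its CPU and memory from one server."""
    return sum(min(c // cpu, m // mem) for c, m in servers)


def decoupled_instances(servers, cpu, mem):
    """Upper bound when CPU and memory may be drawn from different servers."""
    total_cpu = sum(c for c, _ in servers)
    total_mem = sum(m for _, m in servers)
    return min(total_cpu // cpu, total_mem // mem)


print(monolithic_instances(servers, cpu=2, mem=8))  # 0: each server runs out of one resource
print(decoupled_instances(servers, cpu=2, mem=8))   # 4: memory (32 GB // 8 GB) is the limit
```

With single-server placement, all 11 cores and 32 GB sit idle; letting memory live on one server and CPU on another recovers up to four instances.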
Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony

Qiao, Yifan; Wang, Chenxi; Ruan, Zhenyuan; Belay, Adam; Lu, Qingda; Zhang, Yiying; Kim, Miryung; Xu, Guoqing Harry (April 2023, 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI'23))

Remote memory techniques are gaining traction in datacenters because they can significantly improve memory utilization. A popular approach is to use kernel-level, page-based memory swapping to deliver remote memory because it is transparent, enabling existing applications to benefit without modifications. Unfortunately, current implementations suffer from high software overheads, resulting in significantly worse tail latency and throughput relative to local memory. Hermit is a redesigned swap system that overcomes this limitation through a novel technique called adaptive, feedback-directed asynchrony. It takes non-urgent but time-consuming operations (e.g., swap-out, cgroup charge, I/O deduplication) off the fault-handling path and executes them asynchronously. Unlike prior work such as Fastswap, Hermit collects runtime feedback and uses it to direct how asynchrony should be performed: whether asynchronous operations should be enabled, the level of asynchrony, and how asynchronous operations should be scheduled. We implemented Hermit in Linux 5.14. An evaluation with a set of latency-critical applications shows that Hermit delivers low-latency remote memory. For example, it reduces the 99th percentile latency of Memcached by 99.7%, from 36 ms to 91 µs. Running Hermit over batch applications improves their overall throughput by 1.24× on average. These results are achieved without changing a single line of user code.
Full Text Available
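The "level of asynchrony" knob in the Hermit abstract can be pictured as a feedback loop. The Python sketch below is a toy controller, not Hermit's kernel policy: it assumes two hypothetical per-interval signals (the fraction of page faults that had to reclaim memory inline, and the number of idle cores) and adds or removes background reclaim threads accordingly. The thresholds are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One sampling interval of hypothetical runtime measurements."""
    sync_reclaim_ratio: float  # fraction of faults that reclaimed memory on the fault path
    idle_cores: int            # cores currently available for background work


class AsyncController:
    """Toy feedback loop choosing how many background reclaim threads to run."""

    def __init__(self, max_threads, high=0.05, low=0.01):
        self.threads = 0
        self.max_threads = max_threads
        self.high = high  # above this, faults are paying too often for reclamation
        self.low = low    # below this, background reclamation is more than keeping up

    def update(self, fb):
        if fb.sync_reclaim_ratio > self.high and fb.idle_cores > 0:
            # Move more reclamation off the fault path while spare CPU exists.
            self.threads = min(self.threads + 1, self.max_threads)
        elif fb.sync_reclaim_ratio < self.low:
            # Reclamation is ahead of demand: give a thread back to the application.
            self.threads = max(self.threads - 1, 0)
        # Otherwise hold the current level.
        return self.threads
```

Adjusting one thread per interval keeps the controller from oscillating on a single noisy sample; a real system would also decide whether to enable asynchrony at all and how to schedule the offloaded work, which this sketch does not model.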
